Abstract

A method for integrating separately developed information resources that overcomes incompatibilities in syntax and 
semantics and permits the resources to be accessed and modified coherently is described. The method provides logical 
connectivity among the information resources via a semantic service layer that automates the maintenance of data 
integrity and provides an approximation of global data integration across systems. This layer is a fundamental part of the 
Carnot architecture, which provides tools for interoperability across global enterprises.

Resource Integration Using a Large Knowledge Base in Carnot



Christine Collet,* Michael N. Huhns, and Wei-Min Shen
Microelectronics and Computer Technology Corporation 




This method for 
integrating separately 
developed information 
resources permits 
access and coherent 
modification. It uses 
the Cyc knowledge base 

as a global schema to 
resolve inconsistencies. 



Today's corporate computing environments have many independent information
resources. Because they must serve the needs of various applications,
the resources might be of different types: for instance, a database
management system with its databases, an information repository, an expert 
system with its knowledge base, or an application program with its data and 
productions. 

These resources are largely incompatible in syntax and semantics, due not only 
to their different types, but also to diverse hardware and operating-system soft- 
ware, various physical and logical data structures, and contrasting corporate uses. 
Information resources attempt to model some portion of the real world, and in this 
attempt necessarily introduce simplifications and inaccuracies that lead to incom- 
patibilities. 

The goal of the research we describe in this article has been to develop a method 
for integrating separately developed information resources that overcomes these 
incompatibilities and permits the resources to be accessed and modified coherent- 
ly. The method provides logical connectivity among the information resources via 
a semantic service layer that automates the maintenance of data integrity and 
provides an approximation of global data integration across systems. This layer is 
a fundamental part of the Carnot architecture, 1 which provides tools for interop- 
erability across global enterprises. 

The need for this capability is critical. Strategic business applications that 
require intercorporate linkage (for example, linking buyers with suppliers) or 
intracorporate integration (for example, producing composite information from 
engineering and manufacturing views of a product) are becoming increasingly 
prevalent. But creating such an environment requires that the incompatibilities, 
arising during query, update, and maintenance operations, be resolved. 2



* Since collaborating on this article, Collet has accepted a professorship in France.

Two approaches to integrated access

There are two general approaches for providing integrated access to a
collection of heterogeneous databases. 3 They are called the composite
approach and the federated or multidatabase approach.

The composite approach 
introduces a global schema 
to describe the information 
in the databases being com- 
posed. Database access and 
manipulation operations are 
expressed in a universal lan- 
guage and then mediated 
through the global schema. 4 
Through this schema, users 
and applications are present- 
ed with the illusion of a sin- 
gle, centralized database. 
They need not be aware of 
semantic conflicts among the 
databases because explicit 
resolutions for the conflicts 
are specified in advance. 
However, the centralized 
view may be very different 
from the previous local views, 
so that existing applications 
might no longer execute correctly. Fur- 
ther, constructing a global schema is not 
only difficult but also must be repeated 
every time a local schema changes or is 
added. 

The federated 5,6 or multidatabase 7 approach avoids constructing a global
schema and merely presents a user with a
collection of local schemas, along with 
tools for information sharing among the 
databases. The user resolves conflicts 
of facts in a manner particular to each 
application and integrates only the nec- 
essary portions of the databases. The 
advantages cited 6,7 for this approach include increased security, easier
maintenance, and the ability to deal with inconsistent databases.

However, a user or application must understand the contents of each
database to know what to include in a query: there is no global schema to
provide advice about semantics. In addition, each database must maintain
knowledge about the other databases with which it shares information. In
Ahlsen and Johannesson, 8 this knowledge takes the form of models of the
other databases, partial global schemas, a common data model, and an
explicit agreement with each of the other databases. The number of local
agreements and partial global schemas may be as high as N(N - 1), where N
is the number of databases. By contrast, in the composite approach, only N
mappings are required to translate between N databases and a global schema.

Our methodology for semantic integration is based on the composite
approach, but our implementation differs in three ways, enabling us to
combine the advantages of both approaches while avoiding some of their
shortcomings.

First, rather than redo the global schema each time a new resource is to be
integrated or a previously integrated resource is altered, we use an
existing global schema: the Cyc knowledge base. 9 The schemas of individual
resources are independently compared and merged with this knowledge base,
although not with each other, making a global schema much easier to
construct and maintain.

The Cyc knowledge base is the best available candidate for a global schema
because of

• its large size (about 50,000 entities and relationships expressed as
frames and slots), which covers a large portion of the real world and the
subject matter of most information resources;

• its rich set of abstraction mechanisms, which ease the process of
representing predefined groupings of concepts;

• its knowledge representation and inference mechanisms, which are needed
to express the relationships among information resources and to construct,
represent, and maintain a global schema; and

• its typing mechanism, which is used to integrate and check the
consistency of query results.

Figure 1. Global and local views in semantic transaction processing.
(Diagram label: local-to-global semantic translation by articulation axioms.)


Second, unlike most previous work 
on database schema integration, we use 
not only a structural description of the 
local schemas in resolving semantic dif- 
ferences but also all available knowl- 
edge, including 

• schema knowledge, that is, the struc- 
ture of the data, integrity constraints, 
and allowed operations; 

• resource knowledge, that is, a description of supported services, such
as the data model and languages, 
lexical definitions of object names, 
the data itself, comments from the 
resource designer, and guidance 
from the integrator; and 

• organization knowledge, that is, the 
corporate rules governing use of the 
resource. 

Third, the mapping between each in- 
dividual information resource and the 
global schema is accomplished by a set 
of articulation axioms: statements of 
equivalence between components of two theories. 10 The axioms provide not
only a semantic mapping between resources, but
also a means of translation 
that enables the maintenance 
of a global view of all infor- 
mation resources and, at the 
same time, local views that 
correspond to each individu- 
al resource. An application 
can retain its current view but 
take advantage of some of the 
extra information that be- 
comes available as informa- 
tion resources are integrated. 
Of course, any application can 
be modified to use the global 
view directly to access all 
available information. 

The key aspects of our 
method are thus an ability to 
represent and integrate the 
semantics of individual infor- 
mation resources precisely, 
and an ability to maintain both 
global and local views. We 
describe an evaluation of our 
method based on the integra- 
tion of three databases that 
have different data models 
(entity-relationship, relation- 
al, and object-oriented) but 
similar semantics for their 
data (that is, they capture in- 
formation about the same do- 
main). We also describe how 
the global and local views are 
used in semantic transaction 
processing. 



Semantic transactions with global and local views



Relations       Columns
AAAInfo         name*, address, rateCode, lodgingType, phone, facility
AAADirection    address*, direction
AAACredit       name*, creditCard*
AAARate         name*, season*, 1P, 2P1B, 2P2B, XP, fCode

Figure 2. A relational database schema for the AAA tour book database.



Object
  subclasses: FodorInfo, FodorAddress, FodorPhone, FodorFacility
  FodorPhone: phoneNum
  FodorFacility: facilityCode (TV, pool, restaurant, bar)

Figure 3. The object-oriented schema for the Fodor database.




Figure 4. The entity-relationship schema for the Mass 
database. 



Resource integration is achieved by separate mappings between each
information resource and the global schema (see Figure 1). Each mapping
consists of a syntax translation and a semantics translation. The syntax
translation provides a bidirectional translation between a local data
manipulation language (DML_i) and the global context language (GCL), which
is based on extended first-order logic. The semantics translation is a
mapping between two expressions in GCL that have equivalent meanings. This
is accomplished by a set of logical equivalences in GCL, called
articulation axioms, having the form

ist(G φ) <=> ist(S_i ψ)

where φ and ψ are logical expressions and ist is a predicate that means "is
true in the context." This axiom says that the meaning of φ in the global
schema G is equivalent to the meaning of ψ in the local schema S_i. At
most, n sets of articulation axioms are needed to integrate n resources.
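To make the shape of an articulation axiom concrete, here is a minimal Python sketch of how one might be stored and rendered. The class name `ArticulationAxiom`, its fields, and the helper `mappings_needed` are illustrative assumptions, not part of Carnot; the helper only restates the counts given earlier (N mapping sets for the composite approach, up to N(N - 1) agreements for the federated one).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArticulationAxiom:
    """One equivalence between a global and a local expression (hypothetical type)."""
    global_expr: str    # expression stated in the global schema G
    local_context: str  # name of the local schema, e.g. "AAA"
    local_expr: str     # expression stated in the local schema

    def render(self) -> str:
        # Plain-text rendering in the article's ist(...) <=> ist(...) notation.
        return (f"ist(G {self.global_expr}) <=> "
                f"ist({self.local_context} {self.local_expr})")


def mappings_needed(n: int) -> tuple[int, int]:
    """Return (composite, pairwise) counts of mapping sets for n resources."""
    return n, n * (n - 1)


axiom = ArticulationAxiom("phoneNumber(?H ?P)", "AAA", "phone(?H ?P)")
print(axiom.render())
print(mappings_needed(3))
```

Storing both sides of the equivalence in one record reflects the bidirectional use described in the text: the same axiom can be read left to right or right to left.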
After integration, one can use the information that becomes available
through a global view or a local view. The global view presents users and
applications with the illusion of a single information resource, but they
must use GCL, which might be unfamiliar.

The other option is the lo- 
cal view. Queries and updates 
can be issued against the local 
view, but they are not sent to 
any particular resource. Rath- 
er, they are first translated 
into the global language with 
terms that have global meanings, and then they are translated into the
different DML_i and distributed to the appropriate information resources.
The local view has the advantage that
previous user knowledge and 
application programs do not 
need to be modified to access 
the extra but relevant infor- 
mation that becomes avail- 
able. Note that external sche- 
mas previously defined on 
these local views remain in- 
tact, or new external schemas 
can be defined. 

To illustrate the idea, we 
describe how transactions 
are processed semantically 
through the global and local 
views of three integrated da- 
tabases." The three databas- 
es have the same domain, that 
is, each contains information 
about hotels. However, they 
use different data models. The 
AAA database uses the rela- 
tional model (see Figure 2). 
A hotel is called AAAInfo 
and represented as a relation. 
The features of hotel, such as 
name and address, are repre- 
sented as columns. 

The Fodor database uses an object-oriented data model (see Figure 3). A
hotel is represented as a class called FodorInfo. The features of a hotel
are represented as fields of the class or as other object classes pointed
at by FodorInfo.

The Mass database has an entity-relationship data model (see Figure 4). A
hotel is represented as an entity called MassInfo, and its features are
represented as attributes of this entity or as relationships to other
entities. Note that these schemas represent different perspectives and
different information about hotels.

Some of the Cyc concepts used in integrating these databases are the
collections Lodging and Restaurant, and the predicates hasAmenities,*
phoneNumber, and instanceOf. Example articulation axioms that map between
the three database schemas and the global schema are

ist(G instanceOf(?H Lodging)) <=>
    ist(AAA instanceOf(?H AAAInfo))
ist(G phoneNumber(?H ?P)) <=>
    ist(AAA phone(?H ?P))
ist(G hasAmenities(?H ?F)) <=>
    ist(AAA facility(?H ?F))
ist(G instanceOf(?H Lodging)) <=>
    ist(Fodor instanceOf(?H FodorInfo))
ist(G hasAmenities(?H ?F)) <=>
    ist(Fodor facility(?H ?X) ∧ facilityCode(?X ?F))
ist(G instanceOf(?A Restaurant)) <=>
    ist(Mass instanceOf(?A AmenityInfo) ∧ amenityCode(?A 4))

Based on its local view of the AAA 
database, an application might issue the 
following query for the phone numbers 
of hotels that have a restaurant: 

SELECT phone FROM AAAInfo 
WHERE facility = "Restaurant" 

This local Structured Query Language 
query is first translated into GCL by the 
SQL-GCL syntax translator: 

instanceOf(?L AAAInfo) ∧
instanceOf(?R Restaurant) ∧
facility(?L ?R) ∧ phone(?L ?P)

This expression is then mapped by artic- 
ulation axioms into a new expression 
whose semantics is meaningful in the 
global schema: 

instanceOf(?L Lodging) ∧
instanceOf(?R Restaurant) ∧
hasAmenities(?L ?R) ∧
phoneNumber(?L ?P)
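The local-to-global step for the AAA query can be sketched as a simple term rewrite. This is a minimal Python illustration that assumes each axiom is a one-to-one renaming of a predicate or collection, which holds for the AAA axioms listed earlier but not for axioms that expand one atom into several (such as the Fodor facility axiom). The lookup tables and function names are hypothetical, and the conjunction is printed as AND to keep the output ASCII.

```python
# Renamings read off the AAA articulation axioms in the text (illustrative tables).
AAA_PREDICATES = {"phone": "phoneNumber", "facility": "hasAmenities"}
AAA_COLLECTIONS = {"AAAInfo": "Lodging"}

Atom = tuple[str, str, str]  # (predicate, first argument, second argument)


def to_global(atom: Atom) -> Atom:
    """Rewrite one local AAA atom into its global-schema counterpart."""
    pred, first, second = atom
    if pred == "instanceOf":
        # Only the collection name changes; unknown collections pass through.
        return (pred, first, AAA_COLLECTIONS.get(second, second))
    return (AAA_PREDICATES.get(pred, pred), first, second)


def render(query: list[Atom]) -> str:
    """Print a conjunction of atoms in a GCL-like prefix form."""
    return " AND ".join(f"{p}({a} {b})" for p, a, b in query)


local_query = [
    ("instanceOf", "?L", "AAAInfo"),
    ("instanceOf", "?R", "Restaurant"),
    ("facility", "?L", "?R"),
    ("phone", "?L", "?P"),
]
global_query = [to_global(atom) for atom in local_query]
print(render(global_query))
```

Applying the inverse tables for another resource would give the reverse, global-to-local step; a fuller treatment would need real unification rather than dictionary lookups.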



* In this article, the names of entities, objects, classes, relations,
relationships, and Cyc collections are capitalized, and the names of
attributes, fields, and Cyc slots are not capitalized.



This is then translated into different local queries using the appropriate
articulation axioms in reverse. The translation for the Fodor local schema
is

instanceOf(?L FodorInfo) ∧
facilities(?L ?F) ∧
facilityCode(?F ?R) ∧
instanceOf(?R Restaurant) ∧
phone(?L ?P) ∧
phoneNum(?P ?N)

The translation for the Mass local sche- 
ma is 

instanceOf(?L MassInfo) ∧
inAmenityRelationship(?L ?A) ∧
instanceOf(?A AmenityRelationship) ∧
involvesAmenities(?A ?F) ∧
instanceOf(?F AmenityInfo) ∧
amenityCode(?F 4) ∧
phone(?L ?N)

These queries are then translated syn- 
tactically to appropriate local data ma- 
nipulation languages before being sent 
to the databases. For example, the query 
sent to the Fodor database is the follow- 
ing object-oriented expression (using Itas- 
ca syntax): 

(SELECT (FodorInfo phones phoneNum)
        (= (path (some facilities)
                 (some facilityCode))
           "Restaurant"))

The query sent to the Mass database is an 
SQL-like entity-relationship expression: 

SELECT phone FROM MassInfo,
    AmenityRelationship,
    AmenityInfo
WHERE MassInfo.AmenityRelationship.amenityCode = 4

The SQL query sent to the AAA data- 
base is not generated from the global 
schema because it is the same as the 
original SQL query. 

After the transactions are executed, 
the distributor assembles the results in 
the local view. In this example, the result 
is a column of phone numbers, because 
the local view of the AAA database is in 
the relational model. These phone num- 
bers come from three databases and the 
list can be much longer than that from 



the AAA database alone. However, us- 
ers and applications need not be aware 
of the extra sources of information. 
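The assembly step for a relational local view can be pictured as merging the per-resource answers into one column. The following Python sketch is only illustrative: the function name, the decision to drop duplicate values, and the phone numbers are all assumptions, since the text does not say how the distributor handles repeated values.

```python
def assemble_column(results_by_resource: dict[str, list[str]]) -> list[str]:
    """Merge per-resource answers into a single column.

    Duplicates are dropped and first-seen order is kept (an assumption;
    the article does not specify duplicate handling).
    """
    seen: set[str] = set()
    column: list[str] = []
    for values in results_by_resource.values():
        for value in values:
            if value not in seen:
                seen.add(value)
                column.append(value)
    return column


# Made-up phone numbers, for illustration only.
results = {
    "AAA": ["555-0101", "555-0102"],
    "Fodor": ["555-0102", "555-0103"],
    "Mass": ["555-0104"],
}
print(assemble_column(results))
```

The caller sees a single list, which matches the point in the text that users need not know which resource supplied each value.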

The same process is used if a query is 
asked against the global view. In this 
case, the query is translated into three 
queries that are distributed to the three
databases, and the results are in logic 
form (because the global view is de- 
scribed by first-order logic). 

The global view is implemented as a 
combination of a transaction generator, 
a transaction distributor, and a result 
assembler. The transaction distributor 
is written in the actor language Rosette, 
which provides concurrency and asyn- 
chrony. The transactions are distribut- 
ed to the resources, along with depen- 
dency properties among updated data 
and consistency requirements for these 
data (represented by eventual consis- 
tency, periodic consistency, or lagging 
consistency properties). We consider the 
execution of updates an important issue 
of schema integrity. 
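The distributor described above is written in Rosette; the Python sketch below is only a stand-in that uses a thread pool to show the general idea of sending one translated query to each resource concurrently and collecting the answers. The function names and the fake executor are hypothetical, and the sketch ignores the dependency and consistency properties mentioned in the text.

```python
from concurrent.futures import ThreadPoolExecutor


def distribute(transactions: dict[str, str], execute) -> dict[str, str]:
    """Run execute(resource, query) for every resource concurrently.

    Returns the results keyed by resource name.
    """
    workers = max(1, len(transactions))  # the pool needs at least one worker
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {name: pool.submit(execute, name, query)
                   for name, query in transactions.items()}
        return {name: future.result() for name, future in futures.items()}


def fake_execute(resource: str, query: str) -> str:
    # Placeholder for a real call to a database; it just echoes its input.
    return f"{resource} ran: {query}"


answers = distribute(
    {"AAA": "SELECT phone ...", "Fodor": "(SELECT ...)", "Mass": "SELECT phone ..."},
    fake_execute,
)
print(answers)
```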

The development of 
articulation axioms 

Articulation axioms for an informa- 
tion resource are developed in a three- 
phase process: 

(1) schema representation, 

(2) concept matching, and 

(3) axiom construction. 

The schema representation phase produces a Cyc context (microtheory 10)
containing a model for an information 
resource. In the second phase, concepts 
from the model are matched with ap- 
propriate concepts in Cyc's base con- 
text, the global schema (see Table 1). If 
there are no frames in the global sche- 
ma corresponding to ones in the local 
schema, they are created. Matching is 
thus an interactive process. The user 
also might have to augment the model 
with additional properties (semantics) 
about the local schema for the matching 
phase to be completed. In the third phase, 
the matches are converted automatical- 
ly into articulation axioms by instanti- 
ating templates for these axioms with 
terms from the matches. 

Schema representation. The schema 
representation phase of the integration 
methodology represents the schema of an information resource in the
formalism of the global schema (see Table 1). The representation consists
of a set of
Cyc frames with slots residing 
in a separate context created 
for each information resource. 
These frames are instances of 
more general frames describ- 
ing the data model used by the 
schema. For example, if we 
represent an instance of a re- 
lational schema, we will have 
frames of types Relation and 
DatabaseAttribute. The rep- 
resentation is structured us- 
ing the properties described 
in the table in the "Semantics of resources" sidebar.

We define three types of frames for representing schemas:

• DatabaseSchema frames, describing the schemas for different data models;

• DatabaseComponent frames, describing the major components of schemas,
such as relations and entities; and

• DatabaseLink frames, describing different kinds of links used to refine
and relate the major components.

Table 1. Structures for representing database components.

Data Model Concept                  Structure
Object class                        Collection
Object instance                     Individual
Object attribute                    Slot
Object method
Entity-relationship entity          Collection
Entity-relationship relationship    Collection
Entity-relationship attribute       Slot
Relational table                    Collection
Relational attribute                Slot
Relational tuple                    Individual
Hierarchical segment                Collection
Hierarchical record                 Individual
Hierarchical field                  Slot
Codasyl record type                 Collection
Codasyl record                      Individual
Codasyl data item (Field)           Slot
Codasyl set (Link)                  Slot



Every schema and every one of its components (relation, entity, attribute,
etc.) is an instanceOf these types and belongs to a context characterizing
that schema. For example, the schema Mass is represented as an instance of
ERSchema in the context Mass, and every object of Mass is defined in this
context. The slot dBSchemaMt, which is defined for the frame
DatabaseSchema, is used to express the relationship between an instance of
a schema and its context. Information about the use of a resource and the
different functionalities (data definition language or DDL, DML,
transactions) it provides is represented with the same approach as for
schema representation, that is, using frames such as RelationalService,
ERService, RelationalDDLType, and ERTransactionType.
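As a rough picture of the schema representation phase, the Python sketch below models a frame as a small record that names its type and the context it lives in, and uses a lookup table for a few of the data-model-to-structure correspondences from Table 1. The `Frame` class and the frame type name "Entity" are assumptions made for illustration; ERSchema, dBSchemaMt, and the Mass context come from the text.

```python
from dataclasses import dataclass, field

# A few rows of Table 1: data model concept -> Cyc structure.
STRUCTURE_FOR = {
    "Relational attribute": "Slot",
    "Relational tuple": "Individual",
    "Hierarchical segment": "Collection",
    "Hierarchical record": "Individual",
    "Hierarchical field": "Slot",
    "Codasyl record type": "Collection",
}


@dataclass
class Frame:
    """A simplified stand-in for a Cyc frame (hypothetical type)."""
    name: str
    instance_of: str  # the more general frame this one instantiates
    context: str      # the context (microtheory) that holds the frame
    slots: dict = field(default_factory=dict)


# The Mass schema is an instance of ERSchema, defined in the context Mass.
mass_schema = Frame("Mass", "ERSchema", "Mass", {"dBSchemaMt": "Mass"})
# "Entity" is an assumed name for the frame type describing E-R entities.
mass_info = Frame("MassInfo", "Entity", "Mass")

print(mass_schema)
print(STRUCTURE_FOR["Relational tuple"])
```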

Matching. The matching phase of in- 
tegration can be considered the dual 



Semantics of resources 



The semantics of a resource can be obeyed and used more precisely if 
properties in addition to those contained in its schema can be specified. 
We encode in a knowledge base properties for specifying the semantics of 
both individual and collective resources. 

The properties, shown in the accompanying table, provide a rich model for
a resource and its components. A resource is viewed as a set of objects,
along with the services to manipulate the set and the rules for their
use within an organization. Objects can be concepts, models, data, integri- 
ty constraints, application programs, etc. The properties characterize con- 
cepts at the schema level and at the data or value level. They are instanti- 
ated during schema representation and are used during the subsequent 
matching phase of axiom development. 

Little work in schema integration has been done on specifying the se- 
mantics of services and organizations to facilitate query decomposition, 
query optimization, and transaction management. Most work is based on 
the semantics of objects, primarily considering relationships among
entities and attributes. Some of the relationships have been specified
automatically by using heuristics 1 or applying subsumption. 2

Properties for representing resource semantics.

Property                    Applies To
Name                        Schema object
Domain                      Schema object
Format                      Schema object
Makes-sense-for             Schema object
Documentation               Schema object
Integrity constraint        Schema object
Validation                  Schema object
Synonym/homonym/antonym     Schema object
Consistency                 Schema object
Default value               Value object
Maximum value               Value object
Precision                   Value object
Certainty                   Value object
Name                        Service object
Domain                      Service object
Integrity constraints       Service object
Name                        Organization object
Domain                      Organization object
Availability                Organization object

Figure 5. Possible representations in Cyc and resource schemas for the same
concept, each allowing a different aspect of it to be emphasized: (a) Cyc
concept represented as a slot; (b) Schema concept represented as an
attribute; (c) Cyc concept represented as a category; (d) Schema concept
represented as a class; and (e) Schema concept represented as a
relationship.



problem to conceptual modeling for resource design (or to knowledge
representation for expert system design). In conceptual modeling, the
problem is, given a concept from the real world, find its model and
representation. In resource integration, the problem is, given a (Cyc)
representation for a concept, find its corresponding concept in the global
schema. Several factors affect this phase. There might be a mismatch
between the local and global schemas in the depth of knowledge representing
a concept, and there might be mismatches between the structures used to
encode the knowledge, as the example in Figure 5 shows. Specifically,
matching is affected by

• Concept representation in the local 
schema. The local schema might use 
one of three different structures to rep- 
resent a concept, corresponding to the 
primary structures of the relational, 
network or hierarchical, object-orient- 
ed, and entity-relationship data mod- 
els. 

• Concept representation in the global 
schema. The global schema might use 



different structures to represent a concept. In Cyc, a concept can be
represented as either a category (a collection) or an attribute (see Lenat
and Guha, 9 p. 339).

• Relative knowledge. The global sche- 
ma might have more, less, or equivalent 
knowledge compared to a local schema. 
This factor applies to each concept in 
the local schema, rather than to the 
local schema as a whole. 

If the global schema's knowledge is 
more than or equivalent to that of the 
local schema for some concept, the in- 
teractive matching process described in 
this section will find the relevant por- 
tion of the global schema's knowledge. 
This knowledge will be in one of Cyc's 
two forms for concept representation. 
If the global schema has less knowledge 
than the local schema, then knowledge 
will be added to the global schema until 
its knowledge equals or exceeds that in 
the local schema; otherwise, the global 
schema would be unable to model the 
semantics of the resource. The added 
knowledge refines the global schema. 

Finding correspondences between concepts in the local and global schemas
is a subgraph-matching problem. We base subgraph matching on simple string
matching between the names or synonyms of frames representing the database
schema and the names or synonyms of frames in the global schema. Matching
begins by finding associations between attribute/link definitions and
existing slots in the global schema. For example, matching the attribute
definition numberOfRooms in the Mass context results in an association
with the existing slot numberOfRooms in the Cyc global context.
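The name-and-synonym matching can be sketched in a few lines of Python. This is an illustrative approximation, not the Carnot matcher: it assumes case-insensitive equality of strings, and the synonym lists in the example are invented.

```python
def candidate_matches(local_names: list[str],
                      global_frames: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map each local name to the global frames whose name or synonym equals it.

    global_frames maps a frame name to its list of synonyms. Comparison is
    case-insensitive string equality (an assumption of this sketch).
    """
    index: dict[str, list[str]] = {}
    for frame, synonyms in global_frames.items():
        for label in (frame, *synonyms):
            index.setdefault(label.lower(), []).append(frame)
    return {name: index.get(name.lower(), []) for name in local_names}


# Frame names follow the text; the synonyms are made up for illustration.
global_frames = {
    "numberOfRooms": ["roomCount"],
    "phoneNumber": ["phone", "telephone"],
}
matches = candidate_matches(["numberOfRooms", "phone", "other"], global_frames)
print(matches)
```

An empty candidate list, as for "other" here, is the case where string matching alone gives the integrator nothing to choose from.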

After a few matches have been identi- 
fied, either by exact string matches or by 
a user indicating the correct match out of 
a set of candidate matches, possible 
matches for the remaining schema con- 
cepts are greatly constrained (see the 
"Concept matching" sidebar). Converse- 
ly, after integrating an entity or an ob- 
ject, possible matches for its attributes 
are greatly constrained. 

Unfortunately, string matching on names and synonyms is too weak a method
for suggesting candidate matches. We are extending our matching mechanism
to include other properties of the concept. For example, consider the
integration of the attribute "other." The value of this attribute defines a
description or a comment for an entity MassInfo, so its semantics are
similar to the semantics of the Cyc slot "english."

The only means we see for finding 
such a similar slot is by 

(1) accessing the value of the slot entryIsA of "other" to find its
domain C,

(2) finding and listing all of the frames that have C as the value of their
slot entryIsA, and

(3) asking the administrator to choose one from the list.

By considering additional properties for 
"other" given during the schema repre- 
sentation phase, such as its documenta- 
tion, we can shrink the list of frames 
suggested for the concept. 
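The three steps, plus the narrowing by documentation, might look like the following Python sketch. Everything beyond the slot names "other" and "english" is assumed for illustration: the domain labels, the extra slot `nameString`, the documentation strings, and the keyword-based narrowing are hypothetical.

```python
def suggest_slots(attribute: str,
                  local_domains: dict[str, str],
                  global_slots: dict[str, tuple[str, str]],
                  keywords: tuple[str, ...] = ()) -> list[str]:
    """Suggest global slots for a local attribute.

    global_slots maps slot name -> (entryIsA domain, documentation text).
    """
    domain = local_domains[attribute]                      # step 1
    candidates = [name for name, (dom, _doc) in global_slots.items()
                  if dom == domain]                        # step 2
    if keywords:
        # Optional narrowing using words from the attribute's documentation.
        narrowed = [name for name in candidates
                    if any(k.lower() in global_slots[name][1].lower()
                           for k in keywords)]
        if narrowed:
            candidates = narrowed
    return candidates                                      # step 3: user picks one


local_domains = {"other": "String"}
global_slots = {
    "english": ("String", "A free-text English description or comment"),
    "nameString": ("String", "The printed name of a thing"),
    "numberOfRooms": ("Integer", "How many rooms a structure has"),
}
print(suggest_slots("other", local_domains, global_slots))
print(suggest_slots("other", local_domains, global_slots, keywords=("comment",)))
```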

Constructing articulation axioms. An articulation axiom is constructed for
each match found. For example, the match that is found between the
attribute numberOfRooms and the Cyc slot numberOfRooms results in the axiom

ist(G numberOfRooms(?L ?N)) <=>
ist(Mass numberOfRooms(?L ?N))

meaning the numberOfRooms attribute definition determines the
numberOfRooms slot in the global schema and vice versa. Articulation
axioms are constructed automatically by using the matches to instantiate
templates for the axioms, such as the templates shown in Table 2.
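Instantiating a template from a match can be illustrated with ordinary string formatting. The Python sketch below uses a simplified slot-to-attribute template that reproduces the numberOfRooms axiom; the template string, the dictionary keys, and the function name are assumptions, and the instanceOf conjuncts of the full templates are omitted.

```python
# Simplified template for the "Cyc slot matched to schema attribute" case.
SLOT_ATTRIBUTE_TEMPLATE = (
    "ist(G {slot}(?L ?N)) <=> ist({context} {attribute}(?L ?N))"
)


def build_axiom(match: dict[str, str]) -> str:
    """Fill the template with the terms of one match."""
    return SLOT_ATTRIBUTE_TEMPLATE.format(**match)


match = {"slot": "numberOfRooms", "context": "Mass", "attribute": "numberOfRooms"}
print(build_axiom(match))
```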



We have described an experi- 
ment in resource integration 
that we are conducting using 
the large knowledge base Cyc. Integra- 
tion of resource schemas is based on 
articulation axioms defined between two
contexts: a resource schema context and 
a global schema context provided by 
the Cyc knowledge base. 

Our methodology is based on the fol- 
lowing principles: 

• Existing data should not have to migrate or be modified to achieve
integration.

• Existing applications should not have to be modified due to integration.

• Users should not have to adopt a new language for communicating with the
resultant integrated system, unless they are accessing new types of
information.

• Resources should be able to be integrated independently, and the mappings
that result should not have to change when additional resources are
integrated.

These principles are incorporated in 
an integration tool for assisting an ad- 
ministrator in integrating a resource, 
and a transaction tool for providing us- 
ers and applications with access to the 
integrated resources. The integration 
tool uses an extensive set of semantic 
properties to represent an information 
resource declaratively within the global 
schema and to construct bidirectional 
mappings between the resource and the 
global schema. The transaction tool uses 
the mappings to translate queries and 
updates written against any local sche- 
ma into the appropriate form for each 
information resource. These tools con- 
stitute part of the semantic services of 
Carnot, 1 under development at the 
Microelectronics and Computer Tech- 
nology Corporation. Carnot will enable 
development of open applications that 
can be tightly integrated with informa- 
tion stored on existing, closed systems. 
The semantic service layer of Carnot 



Concept matching

Let a_j, j = 1, 2, ..., n denote the attributes of concept E_i in a local
schema. E_i is the domain of the attributes, that is, the entity,
relationship, relation, class, or object for which the a_j are defined. Let
s_j be the global schema slot that corresponds to, or matches, a_j.

Observation 1: The domain C_j of slot s_j is a generalization of the
concept in the global schema that matches E_i.

For example, the domain of the attributes numberOfRooms and phone is the
entity MassInfo, whereas the domains of the corresponding Cyc slots
numberOfRooms and phoneNumber are the frames HumanOccupyingStructure and
Agent, respectively. These are generalizations of Cyc's Lodging, which is
the frame whose semantics most closely corresponds to MassInfo.

As we match each of the attributes of E_i, we compute the common subdomain
of the domains of their corresponding slots. The resulting common
subdomains, although still generalizations of E_i, approximate it more and
more closely.

Observation 2: The "best" match for E_i is ∩_j C_j, the most general
common subdomain (greatest lower bound in the generalization hierarchy) of
the slot domains.

In the example above, the most general common subdomain of
HumanOccupyingStructure and Agent is ServiceOrganization, a generalization
of Lodging. This would be suggested as the approximate match for MassInfo.
If no other attributes are matched, this would also be the best match that
could be determined automatically for MassInfo.

The greatest lower bound might not exist as a single frame in the global
schema, however; it might be a set of frames. For example, the greatest
lower bound would be the set {HumanOccupyingStructure, Agent} if the frame
ServiceOrganization did not exist. In such a case, a frame would be created
in the Cyc knowledge base with the frames in the set listed as its
generalizations.
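Observation 2 can be tried out on a toy hierarchy. The Python sketch below computes the most general common subdomains of a set of slot domains in a small generalization graph. The four frames and their links follow the example in this sidebar, but the graph is a hand-built assumption rather than an extract of Cyc, and the function names are hypothetical.

```python
# Toy generalization hierarchy: frame -> its direct generalizations.
GENERALIZATIONS = {
    "Lodging": {"ServiceOrganization"},
    "ServiceOrganization": {"HumanOccupyingStructure", "Agent"},
    "HumanOccupyingStructure": set(),
    "Agent": set(),
}


def ancestors(frame: str, genls: dict[str, set[str]]) -> set[str]:
    """All generalizations of frame, including the frame itself."""
    seen, stack = {frame}, [frame]
    while stack:
        for parent in genls[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def most_general_common_subdomains(domains: list[str],
                                   genls: dict[str, set[str]]) -> set[str]:
    """Frames that specialize every domain and have no more general such frame."""
    common = {f for f in genls if set(domains) <= ancestors(f, genls)}
    return {f for f in common
            if not any(g != f and g in common for g in ancestors(f, genls))}


best = most_general_common_subdomains(
    ["HumanOccupyingStructure", "Agent"], GENERALIZATIONS)
print(best)
```

With a single slot domain the function simply returns that domain, which mirrors the remark that the approximation tightens as more attributes are matched.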



Table 2. Templates for building articulation axioms.

Notes: These axioms assume that global entity C_1 matches local entity E_1.
iO denotes instanceOf, LC denotes the local schema context, and specs
denotes Cyc's subclass relation.

Cyc Concept     Schema Concept
Represented     Represented          Articulation Axiom Template

As slot, s      As attribute, a      ist(G iO(x C_1) ∧ s(x y)) <=>
                                     ist(LC iO(x E_1) ∧ a(x y))

As slot         As class             ist(G iO(x C_1) ∧ s(x y)) <=>
                                     ist(LC iO(x y))

As slot         As relationship, R   ist(G iO(x C_1) ∧ iO(y C_2) ∧ s(x y)) <=>
                                     ist(LC iO(x E_1) ∧ iO(y E_2) ∧ iO(z R)
                                         ∧ r_1(x z) ∧ r_2(z y))

As category     As attribute         ist(G iO(x C_2) ∧ specs(C_1 C_2)) <=>
                                     ist(LC iO(x E_1) ∧ a(x E_2))

As category     As class             ist(G iO(x C_2) ∧ specs(C_1 C_2)) <=>
                                     ist(LC iO(x E_2) ∧ subclasses(E_1 E_2))

As category     As relationship      ist(G iO(x C_2) ∧ specs(C_1 C_2)) <=>
                                     ist(LC iO(x E_2) ∧ iO(y E_1) ∧ iO(z R)
                                         ∧ r_1(y z)

provides facilities to specify and maintain the semantics of an organization's
integrated information resources. 

A major problem we have not yet 
solved is how to integrate the results 
returned from multiresource queries. 
This integration requires both the types 
and the formats of the results. Unfortu- 
nately, attribute values are not usually 
typed objects in databases, and only 
their format is typically specified in schemas. The knowledge
representation language used for the global schema is
strongly typed, however, and provides a 
basis for extending the articulation axi- 
oms to be used for mapping and inte- 
grating results. We are now making this 
extension. 

Another problem involves the size of 
the global schema. Because it is signifi- 
cantly larger than the strict set of frames 
involved in integrating the databases, a 
user can issue more-general queries than 
if a merged schema were used. Howev- 
er, its large size also makes it difficult to 
use. 

Through this article, we have shown 
that a user or an application can contin- 
ue to use a familiar local schema and 
still benefit from resource integration. 
In addition, we are developing a graph- 
ical entity-relationship representation 
of the global schema and an intelligent 
interface for specifying queries with global semantics.